On the Computation of Distances for Probabilistic Context-Free Grammars
Authors
Abstract
Probabilistic context-free grammars (PCFGs) define distributions over strings and are powerful modelling tools in a number of areas, including natural language processing, software engineering, model checking, bio-informatics, and pattern recognition. A common and important question is how to compare the distributions generated or modelled by these grammars: this is done by checking language equivalence and by computing distances. Two PCFGs are language equivalent if every string has the same probability under both grammars. This also means that the distance between them, whichever norm is used, is zero. It is known that the language equivalence problem is interreducible with that of multiple ambiguity for context-free grammars, a long-standing open question. In this work, we prove that computing distances corresponds to solving undecidable questions: this is the case for the L1 and L2 norms, the variation distance, and the Kullback-Leibler divergence. Two more results are less negative: (1) the most probable string can be computed, and (2) the Chebyshev distance (where the distance between two distributions is the maximum difference of probabilities over all strings) is interreducible with the language equivalence problem.
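To make the distances named in the abstract concrete, the following minimal Python sketch evaluates them for two distributions with finite support, given explicitly as dictionaries from strings to probabilities. This is an illustration of the definitions only, not the paper's method: a PCFG generally assigns probability to infinitely many strings, which is where the hardness results apply. The function names, the natural-logarithm base for the Kullback-Leibler divergence, and the choice of the variation distance as half the L1 distance are assumptions made for this example.

```python
import math

# Illustrative sketch: distances between two FINITE-support string
# distributions, each given as a dict {string: probability}.
# All names below are hypothetical; nothing here is taken from the paper.

def _support(p, q):
    """Union of the strings that appear in either distribution."""
    return set(p) | set(q)

def l1_distance(p, q):
    """Sum of absolute probability differences over all strings."""
    return sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in _support(p, q))

def l2_distance(p, q):
    """Square root of the sum of squared probability differences."""
    return math.sqrt(
        sum((p.get(w, 0.0) - q.get(w, 0.0)) ** 2 for w in _support(p, q))
    )

def chebyshev_distance(p, q):
    """Maximum probability difference over all strings."""
    return max(
        (abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in _support(p, q)),
        default=0.0,
    )

def variation_distance(p, q):
    """Assumed definition: half of the L1 distance."""
    return 0.5 * l1_distance(p, q)

def kl_divergence(p, q):
    """Kullback-Leibler divergence of p from q, natural log (assumed base).

    Returns infinity when p gives positive mass to a string q excludes.
    """
    total = 0.0
    for w, pw in p.items():
        if pw == 0.0:
            continue  # zero-probability strings contribute nothing
        qw = q.get(w, 0.0)
        if qw == 0.0:
            return math.inf
        total += pw * math.log(pw / qw)
    return total

# Two toy distributions over the strings "a", "ab", "abb".
p = {"a": 0.5, "ab": 0.25, "abb": 0.25}
q = {"a": 0.25, "ab": 0.5, "abb": 0.25}

if __name__ == "__main__":
    for name, fn in [
        ("L1", l1_distance),
        ("L2", l2_distance),
        ("Chebyshev", chebyshev_distance),
        ("variation", variation_distance),
        ("KL", kl_divergence),
    ]:
        print(name, fn(p, q))
```

On identical inputs every function returns zero, which matches the abstract's remark that language-equivalent grammars are at distance zero.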
Similar resources
Studying impressive parameters on the performance of Persian probabilistic context free grammar parser
In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree banks is becoming more widely appreciated in linguistics research as a whole. F...
Intersection for Weighted Formalisms
The paradigm of parsing as intersection has been used throughout the literature to obtain elegant and general solutions to numerous problems involving grammars and automata. The paradigm has its origins in (Bar-Hillel et al., 1964), where a general construction was used to prove closure of context-free languages under intersection with regular languages. It was pointed out by (Lang, 1994) that ...
Proceedings of the 9th International Workshop Finite State Methods and Natural Language Processing
The paradigm of parsing as intersection has been used throughout the literature to obtain elegant and general solutions to numerous problems involving grammars and automata. The paradigm has its origins in (Bar-Hillel et al., 1964), where a general construction was used to prove closure of context-free languages under intersection with regular languages. It was pointed out by (Lang, 1994) that ...
Prefix Probability for Probabilistic Synchronous Context-Free Grammars
We present a method for the computation of prefix probabilities for synchronous context-free grammars. Our framework is fairly general and relies on the combination of a simple, novel grammar transformation and standard techniques to bring grammars into normal forms.
Query Parsing Using Probabilistic Tree Grammars
The tree representation, using rhythm to define the tree structure and pitch information for node labeling, has proven to be effective in melodic similarity computation. In this paper we propose a solution representing melodies by tree grammars. For that, we infer probabilistic context-free grammars for the melodies in a database, using their tree coding (with duration and pitch) and classi...
Journal title:
- CoRR
Volume: abs/1407.1513
Pages: -
Publication date: 2014